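Despite recent advances in image enhancement, existing methods still struggle to adapt brightness and contrast for both low-light and normal-light images. To address this problem, we propose a novel 2D histogram equalization method. It assumes that intensity occurrence and co-occurrence depend on each other, and derives the distribution of intensity occurrence (the 1D histogram) by marginalizing over the distribution of intensity co-occurrence (the 2D histogram). This scheme improves global contrast more effectively and reduces noise amplification. The 2D histogram is defined by incorporating local pixel value differences in the image reflectance into the density estimation, which alleviates the adverse effects of dark lighting conditions. More than 500 images were used for evaluation, demonstrating that our method outperforms existing studies. It sufficiently improves the brightness of low-light images while avoiding over-enhancement in normal-light images.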
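The marginalization step described above can be illustrated with a minimal sketch. It assumes plain co-occurrence counts between 4-connected neighbours; the reflectance-based weighting of local pixel differences mentioned in the abstract is not reproduced, and all function names are hypothetical.

```python
def cooccurrence_histogram(image, levels):
    """Count how often intensity i occurs next to intensity j (4-connected)."""
    h2d = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((0, 1), (1, 0), (0, -1), (-1, 0)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    h2d[image[r][c]][image[rr][cc]] += 1
    return h2d


def marginalize(h2d):
    """Sum over the second axis of the 2D histogram to get a 1D histogram."""
    return [sum(row) for row in h2d]


def equalization_map(h1d, levels):
    """Build an intensity mapping from the cumulative 1D histogram."""
    total = sum(h1d)
    if total == 0:
        # No neighbour pairs (e.g. a 1x1 image): fall back to the identity map.
        return list(range(levels))
    mapping, running = [], 0
    for count in h1d:
        running += count
        mapping.append((levels - 1) * running // total)
    return mapping


def equalize(image, levels=256):
    """Equalize an image using the marginal of its co-occurrence histogram."""
    h1d = marginalize(cooccurrence_histogram(image, levels))
    mapping = equalization_map(h1d, levels)
    return [[mapping[v] for v in row] for row in image]
```

Calling `equalize(image)` on a list of rows of integer intensities returns an image of the same shape. Swapping in a differently weighted 2D histogram only requires replacing `cooccurrence_histogram`.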
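We propose a novel framework for target speech extraction based on semantic information, called ConceptBeam. Target speech extraction means extracting the speech of a target speaker from a mixture. Typical approaches have exploited properties of audio signals, such as harmonic structure and direction of arrival. In contrast, ConceptBeam tackles the problem with semantic clues. Specifically, we use a concept specifier, such as an image or speech, to extract the speech of speakers talking about a concept, i.e., a topic of interest. Solving this novel problem would open the door to innovative applications, such as focusing on a particular topic discussed in a conversation. Unlike keywords, concepts are abstract notions, which makes it challenging to represent a target concept directly. In our scheme, a concept is encoded as a semantic embedding by mapping the concept specifier to a shared embedding space. This modality-independent space can be built by deep metric learning on paired data consisting of images and their spoken captions. We use it to bridge modality-dependent information, i.e., the speech segments in the mixture, and the specified, modality-independent concept. As a proof of our scheme, we performed experiments using a set of images associated with spoken captions. That is, we generated speech mixtures from these spoken captions and used the images or speech signals as concept specifiers. We then extracted the target speech using the acoustic characteristics of the identified segments. We compare ConceptBeam with two methods: one based on keywords obtained from recognition systems, and another based on sound source separation. We show that ConceptBeam clearly outperforms the baseline methods and effectively extracts speech based on the semantic representation.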
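As a rough illustration of matching speech segments to a concept in a shared embedding space, the sketch below scores each segment embedding against the concept embedding. The use of cosine similarity, the threshold value, and the function names are assumptions made for this example; the abstract does not specify how segments are identified.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_segments(concept_embedding, segment_embeddings, threshold=0.5):
    """Return indices of segments whose embedding is close to the concept."""
    return [
        index
        for index, segment in enumerate(segment_embeddings)
        if cosine_similarity(concept_embedding, segment) >= threshold
    ]
```

In a real system the embeddings would come from the learned image and speech encoders; here they are plain lists of floats supplied by the caller.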
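Font or typeface styles are often associated with specific impressions, such as heavy, contemporary, or elegant. This indicates that there are certain correlations between font shapes and their impressions. To understand these correlations, this paper realizes a shared latent space in which a font and its impressions are embedded close to each other. The difficulty is that the impression words attached to a font are often very noisy, because impression words are highly subjective and diverse. More importantly, some impression words have no direct relevance to font shape and disturb the realization of the shared latent space. We therefore use DeepSets to enhance shape-relevant words and automatically suppress shape-irrelevant words while training the shared latent space. Quantitative and qualitative experimental results on a large font-impression dataset demonstrate that the shared latent space of the proposed method describes the correlations appropriately, especially for shape-relevant impression words.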
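The general DeepSets pattern, a per-element transform followed by sum pooling and a second transform, can be sketched as follows. This shows only why the encoding ignores word order; the toy `phi` and `rho` functions in the usage note are stand-ins for learned networks, and the paper's mechanism for enhancing or suppressing words is not reproduced.

```python
def deep_sets_encode(word_vectors, phi, rho):
    """Encode a set of vectors as rho(sum(phi(x))) so that order does not matter."""
    transformed = [phi(vector) for vector in word_vectors]
    # Sum pooling across the set, one coordinate at a time.
    pooled = [sum(coordinates) for coordinates in zip(*transformed)]
    return rho(pooled)
```

For example, with `phi = lambda v: [x * x for x in v]` and `rho = lambda v: [2 * x for x in v]`, the calls `deep_sets_encode([[1, 2], [3, 4]], phi, rho)` and `deep_sets_encode([[3, 4], [1, 2]], phi, rho)` give the same result, because the sum does not depend on element order.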
Taking background knowledge into account as context has always been an important part of solving tasks that involve natural language. One representative example of such tasks is text-based games, where players need to make decisions based on both the description text previously shown in the game and their own background knowledge about language and common sense. In this work, we investigate not only how to supply common sense, as prior research has done, but also how to use it effectively. We assume that the parts of the environment state that differ from common sense should be one of the grounds for action selection. We propose a novel agent, DiffG-RL, which constructs a Difference Graph that organizes the environment states and common sense by means of interactive objects, together with a dedicated graph encoder. DiffG-RL also contains a framework for extracting the appropriate amount and representation of common sense from the source to support the construction of the graph. We validate DiffG-RL in experiments with text-based games that require common sense and show that it outperforms the baselines by 17% in score. The code is available at https://github.com/ibm/diffg-rl
Several techniques for mapping various types of components, such as words, attributes, and images, into an embedded space have been studied. Most of them estimate the embedded representation of a target entity as a point in the projective space. Some models, such as Word2Gauss, assume a probability distribution behind the embedded representation, which enables the spread or variance of the meaning of embedded target components to be captured and considered in more detail. We examine the method of estimating embedded representations as probability distributions for the interpretation of fashion-specific terms that are abstract and difficult to understand. Terms such as "casual," "adult-casual," "beauty-casual," and "formal" are extremely subjective and abstract and are difficult for both experts and non-experts to understand, which discourages users from trying new fashion. We propose an end-to-end model called dual Gaussian visual-semantic embedding, which maps images and attributes into the same projective space and, through its broad applications, enables the interpretation of the meaning of these terms. We demonstrate the effectiveness of the proposed method through multifaceted experiments involving image and attribute mapping, image retrieval and re-ordering techniques, and a detailed theoretical and analytical discussion of the distance measure included in the loss function.
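To make the idea of comparing embeddings that are distributions rather than points concrete, here is a minimal sketch of the KL divergence between two Gaussians with diagonal covariance, one of the measures used with Word2Gauss-style embeddings. Whether the proposed model uses this particular measure is not stated above, so treat the choice, and the function name, as assumptions.

```python
import math


def kl_diagonal_gaussians(mu0, var0, mu1, var1):
    """KL(N(mu0, diag(var0)) || N(mu1, diag(var1))) for positive variances."""
    total = 0.0
    for m0, v0, m1, v1 in zip(mu0, var0, mu1, var1):
        total += v0 / v1 + (m1 - m0) ** 2 / v1 - 1.0 + math.log(v1 / v0)
    return 0.5 * total
```

Unlike a point-to-point distance, this measure is asymmetric: a narrow distribution sits inside a broad one at low cost, while the reverse direction is penalized more heavily. That asymmetry is one way the spread of a term's meaning can be taken into account.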
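Dropout is one of the most popular regularization techniques in neural network training. Because of its power and simplicity, dropout has been analyzed extensively and many variants have been proposed. In this paper, several properties of dropout are discussed from the viewpoint of information geometry. We show that dropout flattens the model manifold and that its regularization performance depends on the amount of curvature. We then show that dropout essentially corresponds to a regularization that depends on the Fisher information, and we support this result with numerical experiments. Such a theoretical analysis of the technique from a different perspective is expected to greatly assist the understanding of neural networks, which is still in its infancy.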
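For readers unfamiliar with the technique being analyzed, the following is a minimal sketch of standard (inverted) dropout applied to a list of activations. It illustrates the training-time operation only and does not implement the information-geometric analysis; the function name and signature are hypothetical.

```python
import random


def dropout(activations, drop_prob, rng):
    """Zero each activation with probability drop_prob and rescale the rest."""
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    keep_prob = 1.0 - drop_prob
    # Dividing by keep_prob keeps the expected value of each unit unchanged.
    return [
        value / keep_prob if rng.random() < keep_prob else 0.0
        for value in activations
    ]
```

Passing a seeded generator such as `random.Random(0)` makes the mask reproducible. At inference time dropout is normally disabled, which corresponds to `drop_prob=0.0` here.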